feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779

tac0turtle · 2025-10-23T11:31:23Z

This PR increases the default ReadinessMaxBlocksBehind value from 3 to 30 blocks to avoid false-negatives during normal p2p sync operations.

Problem

The current default of 3 blocks marks most healthy nodes as "not ready" because they are routinely >3 blocks behind best-head while the p2p sync executes blocks in batches. Observability tools that scrape /health/ready every 5s therefore report the node as permanently offline.

Solution

Raised the default to 30 blocks (≈ 15s worth of blocks with 0.5s block time). This keeps the safety property ("stuck node is detected") while removing the false-positive during normal batch-sync.

Changes

Updated pkg/config/defaults.go to change ReadinessMaxBlocksBehind from 3 to 30

Fixes #2778

Generated with Claude Code

The previous default of 3 blocks caused false-negatives during normal p2p sync operations, where nodes routinely lag by 6-7 seconds (12-14 blocks at 500ms block time) while executing blocks in batches. This change raises the default to 30 blocks (15s worth of blocks with 0.5s block time), which maintains the ability to detect stuck nodes while eliminating false-positives during normal batch-sync operations. Fixes #2778 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-authored-by: Marko <tac0turtle@users.noreply.github.com>

claude · 2025-10-23T11:31:45Z

Claude finished @tac0turtle's task —— View job

Comprehensive Code Review

I've completed a thorough review of PR #2779. Below are my findings across code quality, architecture, testing, security, and performance dimensions.

🎯 Summary

This PR successfully makes the ReadinessMaxBlocksBehind threshold dynamic and configurable. The implementation is sound, but there are several improvements needed for production readiness.

Overall Assessment: ✅ Approve with recommended changes

🔍 Critical Issues

1. Missing Unit Tests for Core Calculation Logic ⚠️ HIGH PRIORITY

Location: pkg/config/defaults.go:42-50

The calculateReadinessMaxBlocksBehind() function has zero test coverage. This is a critical gap because:

The function contains edge case handling (zero values)
The calculation directly affects health check behavior
Codecov shows only 57.14% patch coverage with 6 lines missing

Recommendation:

// Add to pkg/config/defaults_test.go
func TestCalculateReadinessMaxBlocksBehind(t *testing.T) {
    tests := []struct {
        name          string
        blockTime     time.Duration
        windowSeconds uint64
        expected      uint64
    }{
        {
            name:          "default_1s_blocktime_15s_window",
            blockTime:     1 * time.Second,
            windowSeconds: 15,
            expected:      15,
        },
        {
            name:          "testnet_500ms_blocktime_15s_window",
            blockTime:     500 * time.Millisecond,
            windowSeconds: 15,
            expected:      30,
        },
        {
            name:          "slow_chain_2s_blocktime_15s_window",
            blockTime:     2 * time.Second,
            windowSeconds: 15,
            expected:      7,
        },
        {
            name:          "custom_30s_window",
            blockTime:     1 * time.Second,
            windowSeconds: 30,
            expected:      30,
        },
        {
            name:          "zero_blocktime_fallback",
            blockTime:     0,
            windowSeconds: 15,
            expected:      30, // fallback value
        },
        {
            name:          "zero_window_fallback",
            blockTime:     1 * time.Second,
            windowSeconds: 0,
            expected:      15, // 15s default / 1s = 15
        },
        {
            name:          "both_zero_fallback",
            blockTime:     0,
            windowSeconds: 0,
            expected:      30, // blockTime checked first
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result := calculateReadinessMaxBlocksBehind(tt.blockTime, tt.windowSeconds)
            assert.Equal(t, tt.expected, result)
        })
    }
}

2. Integer Division Truncation - Potential Incorrect Behavior ⚠️ MEDIUM PRIORITY

Location: pkg/config/defaults.go:49

return uint64(time.Duration(windowSeconds) * time.Second / blockTime)

Issue: Integer division truncates remainders, which can lead to unexpected results:

blockTime = 700ms, windowSeconds = 15 → 15000ms / 700ms = 21.42... → truncates to 21 blocks
Expected: ~22 blocks (rounding up for safety)

Example Impact:

700ms block time: 21 blocks ≈ 14.7s (under the 15s window)
333ms block time: 45 blocks ≈ 14.985s (under the 15s window)

Recommendation: Add rounding for better accuracy

// Option 1: Round to nearest
blocks := uint64((time.Duration(windowSeconds)*time.Second + blockTime/2) / blockTime)

// Option 2: Round up (more conservative)
blocks := uint64((time.Duration(windowSeconds)*time.Second + blockTime - 1) / blockTime)

// Option 3: Document truncation behavior in comment
// calculateReadinessMaxBlocksBehind calculates how many blocks represent the readiness window
// based on the given block time and window duration in seconds. The result is truncated (not rounded)
// meaning the actual time window may be slightly less than windowSeconds.

3. Potential Configuration Mismatch at Runtime ⚠️ MEDIUM PRIORITY

Location: pkg/config/defaults.go:56-71

The ReadinessMaxBlocksBehind is calculated once at startup using the default blockTime. However:

Users can override blockTime via CLI flags or YAML config
The calculation happens before config overrides are applied
This creates a mismatch between the actual block time and readiness threshold

Example Scenario:

# User's config file
node:
  block_time: "500ms"  # User overrides to 500ms
  readiness_window_seconds: 15

What happens:

DefaultConfig() calculates: ReadinessMaxBlocksBehind = 15s / 1s = 15 blocks
User's block_time: 500ms overrides default 1s
Result: 15 blocks × 500ms = 7.5s window (not 15s as intended!)

Recommendation: Calculate dynamically when needed, or recalculate after config load

// Option 1: Calculate in readiness handler (pkg/rpc/server/http.go:66)
allowedBlocksBehind := cfg.Node.ReadinessMaxBlocksBehind
if allowedBlocksBehind == 0 && cfg.Node.ReadinessWindowSeconds > 0 {
    // Auto-calculate if not explicitly set
    allowedBlocksBehind = uint64(time.Duration(cfg.Node.ReadinessWindowSeconds) * time.Second / cfg.Node.BlockTime.Duration)
}

// Option 2: Recalculate after config loading in Load() function
// In pkg/config/config.go after line 501, before returning cfg:
if cfg.Node.ReadinessMaxBlocksBehind == 0 || 
   (cfg.Node.ReadinessWindowSeconds > 0 && cfg.Node.BlockTime.Duration > 0) {
    cfg.Node.ReadinessMaxBlocksBehind = calculateReadinessMaxBlocksBehind(
        cfg.Node.BlockTime.Duration, 
        cfg.Node.ReadinessWindowSeconds,
    )
}

📋 Code Quality Issues

4. Unclear Configuration Semantics - MEDIUM PRIORITY

Location: pkg/config/config.go:203-204

Both ReadinessWindowSeconds and ReadinessMaxBlocksBehind can be set by users, creating ambiguity:

Which one takes precedence?
What happens if both are set?
What happens if only one is set?

Current behavior (undocumented):

If user sets ReadinessMaxBlocksBehind explicitly, it uses that value
ReadinessWindowSeconds is only used during DefaultConfig() calculation
No recalculation happens if block time changes

Recommendation: Clarify precedence in documentation

// ReadinessWindowSeconds is used to calculate ReadinessMaxBlocksBehind automatically.
// If ReadinessMaxBlocksBehind is explicitly set (non-zero), it takes precedence over
// the calculated value. Set ReadinessWindowSeconds to adjust the dynamic threshold,
// or set ReadinessMaxBlocksBehind directly to use a fixed block count.
ReadinessWindowSeconds   uint64 `mapstructure:"readiness_window_seconds" yaml:"readiness_window_seconds" comment:"Time window in seconds used to calculate ReadinessMaxBlocksBehind based on block time. Only used if ReadinessMaxBlocksBehind is not explicitly set. Default: 15 seconds."`
ReadinessMaxBlocksBehind uint64 `mapstructure:"readiness_max_blocks_behind" yaml:"readiness_max_blocks_behind" comment:"How many blocks behind best-known head the node can be and still be considered ready. If 0, calculated automatically from ReadinessWindowSeconds and BlockTime. Set explicitly to override auto-calculation."`

5. Hardcoded Magic Number - LOW PRIORITY

Location: pkg/config/defaults.go:44

return 30 // fallback to safe default if blockTime is not set

Issue: The fallback value 30 should be a named constant for maintainability.

Recommendation:

const (
    DefaultReadinessWindowSeconds = 15
    FallbackReadinessMaxBlocksBehind = 30
)

func calculateReadinessMaxBlocksBehind(blockTime time.Duration, windowSeconds uint64) uint64 {
    if blockTime == 0 {
        return FallbackReadinessMaxBlocksBehind
    }
    if windowSeconds == 0 {
        windowSeconds = DefaultReadinessWindowSeconds
    }
    return uint64(time.Duration(windowSeconds) * time.Second / blockTime)
}

🧪 Testing Issues

6. Test Coverage Gaps - MEDIUM PRIORITY

Missing test scenarios:

Config loading with override behavior - Not tested
- Load config with block_time: 500ms and verify ReadinessMaxBlocksBehind calculation
- Verify explicit ReadinessMaxBlocksBehind is not overridden
Edge case in DefaultConfig - Not tested
- What if DefaultConfig() is called in a context where calculation should happen differently?
Integration test for readiness endpoint - Partially tested
- Current tests in server_test.go don't verify the dynamic calculation
- Tests use hardcoded expectations (15 blocks) but don't verify it matches blockTime * windowSeconds

Recommendation:
Add test in pkg/config/config_test.go:

func TestReadinessCalculationWithConfigOverride(t *testing.T) {
    // Test that changing block_time doesn't automatically recalculate ReadinessMaxBlocksBehind
    cfg := DefaultConfig()
    cfg.Node.BlockTime = DurationWrapper{500 * time.Millisecond}
    
    // Current behavior: ReadinessMaxBlocksBehind stays at 15 (from 1s default)
    assert.Equal(t, uint64(15), cfg.Node.ReadinessMaxBlocksBehind)
    
    // Expected window is now 7.5s, not 15s
    actualWindow := time.Duration(cfg.Node.ReadinessMaxBlocksBehind) * cfg.Node.BlockTime.Duration
    assert.Equal(t, 7500*time.Millisecond, actualWindow) // 7.5s, not 15s!
}

📚 Documentation Issues

7. Missing User-Facing Documentation - LOW PRIORITY

Issue: The relationship between ReadinessWindowSeconds and ReadinessMaxBlocksBehind is not explained for operators.

Recommendation: Add documentation in CLAUDE.md or a dedicated config docs file:

### Readiness Health Check Configuration

The `/health/ready` endpoint considers a node ready when it's within a certain number of blocks 
behind the best-known network height. This threshold is configurable:

#### Option 1: Time-based (Recommended)
Set `readiness_window_seconds` (default: 15 seconds). The node calculates the block threshold 
based on your `block_time`:
- 1s block time → 15 blocks
- 500ms block time → 30 blocks
- 2s block time → 7 blocks

#### Option 2: Block-based
Set `readiness_max_blocks_behind` directly to use a fixed block count regardless of block time.

#### Precedence
If you set `readiness_max_blocks_behind` explicitly, it overrides the time-based calculation.

⚡ Performance & Efficiency

8. Calculation Called on Every Request - INFO

Location: pkg/rpc/server/http.go:66

allowedBlocksBehind := cfg.Node.ReadinessMaxBlocksBehind

Observation: Currently reads from config struct on every request. This is fine for current implementation, but if you implement dynamic calculation (Issue #3), consider caching.

Not an issue now, but note for future:

// If you implement dynamic calculation, cache the result:
var readinessThreshold atomic.Uint64

// Recalculate only when config changes
func updateReadinessThreshold(cfg config.Config) {
    threshold := calculateReadinessMaxBlocksBehind(
        cfg.Node.BlockTime.Duration, 
        cfg.Node.ReadinessWindowSeconds,
    )
    readinessThreshold.Store(threshold)
}

✅ What's Done Well

Clear separation of concerns - Calculation logic is isolated in a dedicated function
Backward compatibility - Existing behavior is preserved (15 blocks for 1s block time)
Edge case handling - Zero value fallbacks prevent panics
Test updates - Existing tests were updated to match new defaults
Flag naming - Clear, descriptive flag names with good help text
Comments - Function has clear docstring explaining purpose

🎯 Recommended Action Items (Priority Order)

HIGH: Add unit tests for calculateReadinessMaxBlocksBehind() (Issue ABCI interface #1)
HIGH: Fix or document integer division truncation behavior (Issue Peer discovery #2)
MEDIUM: Address config mismatch at runtime (Issue Transaction gossiping #3)
MEDIUM: Clarify configuration precedence in comments (Issue Fraud proofs gossiping #4)
MEDIUM: Add test for config override behavior (Issue Fraud proof generation #6)
LOW: Extract magic number to named constant (Issue Mempool #5)
LOW: Add user-facing documentation (Issue Rollback of invalidated transactions #7)

📊 Code Metrics

Lines changed: +26/-5 across 4 files
Test coverage: 57.14% (6 lines missing coverage)
Complexity: Low (single calculation function)
Breaking changes: None
API changes: Additive only (new config field)

🔐 Security Considerations

✅ No security issues identified. The changes are configuration-only and don't introduce new attack surfaces.

🏁 Final Verdict

Approve with changes recommended. The core implementation is solid, but production readiness requires:

Comprehensive unit tests for the calculation logic
Clear documentation of configuration semantics and precedence
Addressing the config mismatch scenario (Issue Transaction gossiping #3)

The PR solves the stated problem (false positives during batch sync), but the implementation has room for improvement in handling edge cases and user configuration scenarios.

• Branch: claude/issue-2778-20251023-1120

github-actions · 2025-10-23T11:31:46Z

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Oct 27, 2025, 9:33 AM

auricom · 2025-10-23T11:36:12Z

@claude instead of hardcoding a value, please use a dynamic one based of config.blocktime
result should be the equivalent of 15seconds

claude · 2025-10-23T11:36:33Z

Claude finished @auricom's task —— View job

Implementation Complete ✅

Successfully made ReadinessMaxBlocksBehind dynamic based on config.blocktime to represent 15 seconds worth of blocks.

Task List

Read config files to understand blocktime configuration
Determine the appropriate approach for dynamic calculation
Implement the calculation (15s / blocktime)
Update tests to reflect the change
Commit and push changes

Implementation Details

Dynamic Calculation:

Created calculateReadinessMaxBlocksBehind() helper function in pkg/config/defaults.go
Formula: 15 seconds / blockTime = blocks behind threshold
Includes fallback to 30 if blockTime is 0 (edge case protection)

Scaling Behavior:
The threshold now scales automatically with different block times:

0.5s block time → 30 blocks (same as before for testnet scenarios)
1.0s block time → 15 blocks (new default)
2.0s block time → 7 blocks (scales appropriately)

This ensures the 15-second window for normal batch-sync latency is maintained regardless of the configured block time, while still detecting stuck nodes.

Test Updates:
Updated pkg/rpc/server/server_test.go:

within_3_blocks → within_15_blocks (local: 85, bestKnown: 100)
just_over_3_blocks → just_over_15_blocks (local: 84, bestKnown: 100)

Tests now reflect the new default of 15 blocks for the 1-second block time.

Changed Files

pkg/config/defaults.go: Added calculation function and updated default config
pkg/rpc/server/server_test.go: Updated test cases to match new default

View Job • Branch: claude/issue-2778-20251023-1120

Instead of hardcoding ReadinessMaxBlocksBehind to 30, calculate it dynamically as 15 seconds worth of blocks based on the configured block time. This ensures the threshold scales appropriately: - 0.5s block time → 30 blocks (same as before for testnet) - 1.0s block time → 15 blocks (new default) - 2.0s block time → 7 blocks (scales down appropriately) The 15-second window allows for normal batch-sync latency while still detecting stuck nodes, regardless of block time configuration. Changes: - Added calculateReadinessMaxBlocksBehind() helper function - Updated DefaultConfig() to use calculated value - Updated tests to reflect new default (15 blocks for 1s block time) - Includes fallback to 30 if block time is 0 Co-authored-by: auricom <auricom@users.noreply.github.com>

codecov · 2025-10-23T11:42:48Z

Codecov Report

❌ Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.71%. Comparing base (40cfd71) to head (e121323).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/config/defaults.go	63.63%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2779      +/-   ##
==========================================
- Coverage   62.71%   62.71%   -0.01%     
==========================================
  Files          82       82              
  Lines        7081     7091      +10     
==========================================
+ Hits         4441     4447       +6     
- Misses       2118     2120       +2     
- Partials      522      524       +2

Flag	Coverage Δ
combined	`62.71% <66.66%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

auricom · 2025-10-23T12:05:21Z

@claude do not hardcode 15seconds, make this value part of config so that a user can override our default value

claude · 2025-10-23T12:05:42Z

Claude finished @auricom's task —— View job

Implementation Complete ✅

Successfully made the readiness window configurable instead of hardcoded to 15 seconds.

Task List

Read current implementation to understand structure
Add new config field for readiness window duration
Update defaults to include the new field (15s default)
Modify calculation to use configurable value
Update tests to verify changes
Commit and push changes

Implementation Details

New Configuration Field:

Added ReadinessWindowSeconds to NodeConfig (default: 15 seconds)
Users can now override this value via CLI flag or YAML config:
- CLI: --evnode.node.readiness_window_seconds=30
- YAML: node.readiness_window_seconds: 30

Dynamic Calculation:
The calculateReadinessMaxBlocksBehind() function now accepts both blockTime and windowSeconds parameters:

Formula: windowSeconds / blockTime = blocks behind threshold
Includes fallbacks for edge cases (0 values)

Behavior Examples:
With default 15-second window:

0.5s block time → 30 blocks
1.0s block time → 15 blocks
2.0s block time → 7 blocks

With custom 30-second window:

0.5s block time → 60 blocks
1.0s block time → 30 blocks
2.0s block time → 15 blocks

Test Updates:

Added test assertion for the new flag
Updated expected flag count from 38 to 39

Changed Files

pkg/config/config.go: Added field definition, CLI flag constant, and flag registration
pkg/config/defaults.go: Updated calculation function and default config
pkg/config/config_test.go: Added test for new flag

View Job • Branch: claude/issue-2778-20251023-1120

Added ReadinessWindowSeconds config field (default: 15s) to allow users to override the time window used for calculating ReadinessMaxBlocksBehind based on block time. This provides flexibility while maintaining the dynamic calculation based on block time. Changes: - Added ReadinessWindowSeconds field to NodeConfig - Updated calculateReadinessMaxBlocksBehind to accept window parameter - Added CLI flag and default value (15 seconds) - Updated tests to verify new configuration Co-authored-by: auricom <auricom@users.noreply.github.com>

auricom · 2025-10-24T10:05:37Z

I tried to deploy it locally to check if that works. but i cannot sync the local node as main is broken, so readiness is always offline.

* main: build(deps): Bump actions/upload-artifact from 4 to 5 (#2788) build(deps): Bump actions/download-artifact from 5 to 6 (#2786) chore!: audit fixes (#2764) refactor(executing): add retries on ExecuteTxs (#2784) feat: increase default ReadinessMaxBlocksBehind from 3 to 30 (#2779)

github-project-automation bot added this to Evolve Oct 23, 2025

github-actions bot assigned tac0turtle Oct 23, 2025

tac0turtle marked this pull request as ready for review October 23, 2025 11:31

tac0turtle requested a review from auricom October 23, 2025 13:33

tac0turtle added blocked and removed blocked labels Oct 25, 2025

Merge branch 'main' into claude/issue-2778-20251023-1120

e121323

julienrbrt approved these changes Oct 27, 2025

View reviewed changes

tac0turtle added this pull request to the merge queue Oct 27, 2025

Merged via the queue into main with commit 15afbee Oct 27, 2025
30 checks passed

tac0turtle deleted the claude/issue-2778-20251023-1120 branch October 27, 2025 13:51

github-project-automation bot moved this to Done in Evolve Oct 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779

feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779

Uh oh!

tac0turtle commented Oct 23, 2025

Uh oh!

claude bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

auricom commented Oct 23, 2025

Uh oh!

claude bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

codecov bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

auricom commented Oct 23, 2025

Uh oh!

claude bot commented Oct 23, 2025 •

edited

Loading

Uh oh!

auricom commented Oct 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779

feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779

Uh oh!

Conversation

tac0turtle commented Oct 23, 2025

Problem

Solution

Changes

Uh oh!

claude bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comprehensive Code Review

🎯 Summary

🔍 Critical Issues

1. Missing Unit Tests for Core Calculation Logic ⚠️ HIGH PRIORITY

2. Integer Division Truncation - Potential Incorrect Behavior ⚠️ MEDIUM PRIORITY

3. Potential Configuration Mismatch at Runtime ⚠️ MEDIUM PRIORITY

📋 Code Quality Issues

4. Unclear Configuration Semantics - MEDIUM PRIORITY

5. Hardcoded Magic Number - LOW PRIORITY

🧪 Testing Issues

6. Test Coverage Gaps - MEDIUM PRIORITY

📚 Documentation Issues

7. Missing User-Facing Documentation - LOW PRIORITY

⚡ Performance & Efficiency

8. Calculation Called on Every Request - INFO

✅ What's Done Well

🎯 Recommended Action Items (Priority Order)

📊 Code Metrics

🔐 Security Considerations

🏁 Final Verdict

Uh oh!

github-actions bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

auricom commented Oct 23, 2025

Uh oh!

claude bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation Complete ✅

Task List

Implementation Details

Changed Files

Uh oh!

codecov bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

auricom commented Oct 23, 2025

Uh oh!

claude bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation Complete ✅

Task List

Implementation Details

Changed Files

Uh oh!

auricom commented Oct 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

claude bot commented Oct 23, 2025 •

edited

Loading

github-actions bot commented Oct 23, 2025 •

edited

Loading

claude bot commented Oct 23, 2025 •

edited

Loading

codecov bot commented Oct 23, 2025 •

edited

Loading

claude bot commented Oct 23, 2025 •

edited

Loading